GPU Erasure Coding for Campaign Storage
Abstract
High-performance computing (HPC) demands high-bandwidth, low-latency I/O, which has driven the development of storage systems and I/O software components that strive to deliver ever greater performance. However, capital and energy budgets, together with increasing storage capacity requirements, have motivated the search for lower-cost, large storage systems for HPC. With burst buffer technology increasing the bandwidth and reducing the latency of I/O between the compute and storage systems, the bandwidth and latency requirements on back-end storage can be relaxed, especially underneath an adequately sized modern parallel file system. Cloud computing has led to the development of large, low-cost storage solutions whose design focuses on high capacity, availability, and low energy consumption at the lowest cost. Cloud storage systems leverage replication and erasure coding to provide high availability at much lower cost than traditional HPC storage systems. Leveraging this cloud storage infrastructure and these concepts in HPC would be economically valuable, offering cost-effective performance for certain storage tiers. To enable the use of cloud storage technologies for HPC, we study an architecture for interfacing cloud storage between the HPC parallel file systems and the archive storage. In this paper, we report a comparison of two erasure coding implementations for the Ceph file system, measuring performance across the degrees of sharding that are relevant for HPC applications. We show that the Gibraltar GPU erasure coding library outperforms a CPU implementation of an erasure coding plugin for the Ceph object storage system, opening the potential for new ways to architect such storage systems based on Ceph.
Note on terminology: the literature uses the term "stripe" for a set of data protected by a RAID or erasure coding implementation. The stripe is divided into k data chunks and protected by m parity or coding chunks. In this paper, the terms "strip" and "shard" are used synonymously to refer to these chunks.
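As a concrete illustration of the stripe/shard terminology above, the following minimal Python sketch encodes a stripe into k data shards plus a coding shard using a toy single-parity (m = 1) XOR code, and rebuilds one lost shard. This is only an assumed, illustrative scheme: Ceph's default erasure code plugin and the Gibraltar library both implement Reed-Solomon coding (typically over GF(2^8)) with arbitrary m, and the function names here (encode_stripe, reconstruct) are hypothetical, not part of either API.

```python
# Minimal sketch of the stripe/shard layout described above, assuming a toy
# single-parity (m = 1) XOR code. Real deployments (e.g. Ceph's erasure coding
# plugins or the Gibraltar GPU library) use Reed-Solomon coding and support
# arbitrary m; only the k-data-plus-m-coding shard layout carries over.

def encode_stripe(stripe: bytes, k: int) -> list:
    """Split a stripe into k equal-size data shards and append one XOR parity shard."""
    shard_len = -(-len(stripe) // k)                   # ceiling division
    padded = stripe.ljust(k * shard_len, b"\x00")      # zero-pad the final data shard
    data = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytes(shard_len)
    for shard in data:
        parity = bytes(a ^ b for a, b in zip(parity, shard))
    return data + [parity]                             # k data shards + m = 1 coding shard

def reconstruct(shards: list) -> list:
    """Rebuild exactly one missing shard (data or parity) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) == 1, "single XOR parity tolerates exactly one lost shard"
    shard_len = len(next(s for s in shards if s is not None))
    rebuilt = bytes(shard_len)
    for s in shards:
        if s is not None:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, s))
    shards[missing[0]] = rebuilt
    return shards

if __name__ == "__main__":
    stripe = b"campaign storage object payload"
    shards = encode_stripe(stripe, k=4)    # 4 data shards + 1 parity shard
    shards[2] = None                       # simulate the loss of one device/OSD
    recovered = reconstruct(shards)
    assert b"".join(recovered[:4]).rstrip(b"\x00") == stripe
    print("stripe recovered from", len(recovered) - 1, "surviving shards")
```

In a Ceph erasure coded pool, the k and m profile parameters play the same roles (for example an 8+3 profile), with each shard placed on a different OSD; the XOR arithmetic above is only a stand-in for the Galois field arithmetic that Gibraltar offloads to the GPU.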
Similar Papers
IStore: Towards High Efficiency, Performance, and Reliability in Distributed Data Storage with Information Dispersal Algorithms
Reliability is one of the major challenges for high performance computing and cloud computing. Data replication is a commonly used mechanism to achieve high reliability. Unfortunately, it has low storage efficiency, among other shortcomings. As an alternative to data replication, information dispersal algorithms offer higher storage efficiency, but at the cost of being too computing-intensive ...
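To make the storage-efficiency comparison concrete, the short sketch below computes the raw capacity required per user byte under 3-way replication versus a k+m erasure code; the 8+3 parameters are illustrative assumptions, not figures from the IStore paper.

```python
# Illustrative storage-overhead comparison: 3-way replication vs. a k+m erasure
# code. The k and m values are assumptions for the sake of the example.

def raw_per_user_byte(scheme: str, k: int = 8, m: int = 3) -> float:
    if scheme == "replication":      # three full copies of every object
        return 3.0
    if scheme == "erasure":          # k data shards + m coding shards per stripe
        return (k + m) / k
    raise ValueError(scheme)

for scheme in ("replication", "erasure"):
    print(f"{scheme:>11}: {raw_per_user_byte(scheme):.2f}x raw capacity per user byte")
# replication: 3.00x, erasure (8+3): ~1.38x -- higher storage efficiency, at the
# cost of the encode/decode computation the entry above refers to.
```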
A Non-MDS Erasure Code Scheme for Storage Applications
This paper investigates the use of redundancy and self-repairing against node failures in distributed storage systems using a novel non-MDS erasure code. In the replication method, access to one replication node is adequate to reconstruct a lost node, while in MDS erasure coded systems, which are optimal in terms of the redundancy-reliability tradeoff, a single node failure is repaired after recovering the ...
Parallel coding for storage systems - An OpenMP and OpenCL capable framework
Parallel storage systems distribute data onto several devices. This allows the high access bandwidth that is needed for parallel computing systems. It also improves storage reliability, provided erasure-tolerant coding is applied and the coding is fast enough. In this paper we assume storage systems that apply data distribution and coding in a combined way. We describe how coding can be done p...
On repairing erasure coded data in an active-passive mixed storage network
Citation: Oggier, F., & Datta, A. (2015). On repairing erasure coded data in an active-passive mixed storage network. International Journal on Information and Coding Theory, 3(1).
Abstract: A major change has recently been witnessed in networked distributed storage systems (NDSS), with increased use of erasure codes in lieu of replication for realizing data redundancy. Yet, both the industry and...
Evaluating Information Dispersal Algorithms
The explosion in data acquisition and storage has led to the emergence of data-intensive applications that process enormous quantities of information using methods such as the MapReduce paradigm. Data-Intensive Distributed File Systems (DI-DFS) have been designed to support these kinds of applications. These large-scale storage systems require fault-tolerance mechanisms to handle failu...